Troubleshooting and Optimizing SQL Server 2005 : Data Analysis and Problem Diagnosis

7/16/2011 4:16:09 PM

It is a given that in any computerized environment, things will not be perfect. Machines break down. Programs have bugs. Users make mistakes. Most systems are not optimized from the outset. Imperfections should be expected as part of automation. It is a good thing that in this day and age, tools are available to troubleshoot problems, analyze performance, and implement changes to improve systems. In particular, SQL Server and the Windows operating system provide the mechanisms needed to perform standard database management troubleshooting. Let us begin by looking at the diagnosis and resolution of some common problems.

Tuning the Operating System and Hardware

Much of the performance of any database system relies on the application and database designers. The use of the network, processor, memory, and disk storage can all be dictated by the type of database design and the use of the applications operating against the database server. For this reason, the operating system usually acts as only a starting point in any performance analysis testing. If the hardware configuration is not adequate and the operating system is not properly configured, the database engine will not be able to respond optimally.

The first task that the hardware and operating system serve in any database system is to provide a means to store and operate the database software and objects. The operating system is also responsible for reporting hardware and software problems, as well as making it possible for you to monitor everything executing on the machine.

Creating and Maintaining Statistics

The SQL Server Query Optimizer is the database engine component of SQL Server 2005. As the database engine, it oversees all data-related interaction. It is responsible for generating the execution plans for any SQL operation. In diagnosing a query, the Optimizer must decide on the most efficient means of executing the query and interacting with the database objects.

SQL Server has a cost-based optimizer that can be extremely sensitive to the information provided by statistics. Without accurate and up-to-date statistical information, SQL Server has a great deal of difficulty determining the best execution plan for a particular query.

Often, non-updated statistics cause a system to slow and become unresponsive over time. Periodically checking and using UPDATE STATISTICS on most production systems is warranted.

SQL Server goes through a considerable process when it chooses one execution plan out of several possible methods of executing a given operation. This optimization is one of the most important components of a SQL Server database system. Although the Optimizer’s analysis process incurs some overhead, that overhead is saved in execution.

The Optimizer uses a cost-based analysis procedure. Each possible method of execution has an associated cost, which is determined in terms of the approximate amount of computing resources used in execution. The Query Optimizer must analyze the possible plans and choose the one that has the lowest estimated cost.

It is not uncommon for some complex SELECT statements to have thousands of possible plans. Of course, in such a case, the Optimizer does not analyze every possible combination. It uses a complex series of processes to find a plan that has a cost that is reasonably close to the minimum—a minimum that is only theoretical and unlikely to be achieved.

The SQL Server Optimizer generally makes the best choice for any given execution. It is possible to override the Optimizer’s choices by using code and forcing the execution with hints; however, this is not recommended without significant testing. When a query using hints performs better, you usually implement additional indexing instead of using the hint in production.

Locks, Blocks, and Deadlocks

One of the hallmarks of a true database management system (DBMS) is whether it has the capability to handle more than one user performing simultaneous data modifications. The problem is that when several users in a database make changes, it is likely that they will eventually want to update the same record(s) at the same time. To avoid the problems this would cause, SQL Server and most other DBMSs provide a locking mechanism.

A locking mechanism provides a way to check out a particular row or set of rows from the database so that they cannot be changed by another user until the connection is finished and the changes are made. For connections that are reading data, locking provides a mechanism to prevent other connections from changing the data for the duration of the read or longer. There are two basic types of locks:

Shared locks— A shared lock occurs when a user is trying to read a row of data; for some duration, depending on the transaction isolation level , the user owns a shared lock on the table. Because the user is just trying to read the record, there can be several shared locks on the row, so many people can read the same record at the same time.
Exclusive locks— A user obtains an exclusive lock when he or she needs to change the row. Exclusive locks are not shared; there can be only one user with an exclusive lock on a row at any given time. If a user needs to acquire an exclusive lock to a row that is already locked by another user, lock contention occurs. Some level of contention is normal in a database that is frequently updated. Typically, an application waits for some arbitrary amount of time for the locks to clear and the transaction to complete. This results in an apparent slowdown of the application and the server, and excessive amounts of contention lead to performance degradation and possibly user complaints.

Reducing Lock Contention

There are a few things you can do to reduce lock contention. First, you can make transactions as simple as possible and keep extraneous logic out of transactions. The best case is when you do all the gathering of data and validation of that data outside the transaction, and the transaction is used only to update and insert rows. Second, you should make sure that the application does not have any transactions that wait for user input because users tend to do such things as go to lunch while they have windows open, waiting for them to enter data to complete their transactions.

In general, you should try to collect all the data at once, and then you should start the transaction, make the changes, and commit the changes. You should design applications and databases with concurrency in mind. You should keep tables that are frequently updated small by moving columns that don’t belong in the table or that aren’t changed very often into another table. If a table is going to be updated frequently, you should make sure that it isn’t indexed more than necessary. Data modification statements, such as INSERT, UPDATE, and DELETE, have to change the indexes as they go, so having too many indexes on a table requires them to modify several indexes.

With contention also comes a circumstance known as a deadlock. A deadlock occurs when two processes are locking resources and each one wants the resource that the other has locked. Deadlocks occur when two or more transactions cannot complete because of mutual locks. SQL Server detects a deadlock and rather randomly kills one of the user processes.

Levels of Locks

In addition to using shared and exclusive locks, SQL Server also locks objects at different levels. SQL Server can lock a single row of a table, a single data page, or an entire table.

Typically, SQL Server operates in the page lock mode, in which it locks the data pages being requested. After a certain amount of blocking is noticed, SQL Server slips into a row locking mode, in which single rows are locked.

On the other end of the scale, when a connection attempts to update a certain percentage of a table, SQL Server automatically escalates to a table lock, in which it automatically locks the entire table either exclusively (in the case of a full table update) or in a shared mode (in the case of a full table read). SQL Server also determines lock escalation based on the activity occurring in the table at the time of the lock request. If the activity level is low, SQL Server saves itself some time by escalating the lock sooner so that it has less effect on other users.

There are shared page locks, shared row locks, and shared table locks for reads, along with exclusive page locks, exclusive row locks, and exclusive table locks for writes. Locks and the control of locking behavior are important in DBMSs. To aid in management over a DBMS, you can also implement transactions that directly affect locking behavior. The following section explains how to do this.

Diagnosing Lock Problems

You can diagnose lock, block, and deadlock problems by using the System Monitor (How many are occurring?), Activity Monitor (What is currently occurring?), and SQL Server Profiler (What series of events causes the problem?). A number of stored procedures, dynamic management views or functions, and system tables can also provide information about locking.

Locking-Related Stored Procedures

The sp_lock stored procedure reports snapshot information about locks, including the object ID, index ID, type of lock, and type or resource to which the lock applies. The sp_lock procedure is also a Transact-SQL (T-SQL) alternative that you can use to view lock activity in the Activity Monitor in SQL Server Management Studio. The sp_lock procedure is provided only for backward compatibility and is considered a deprecated function. You may want to query locking information from dynamic management views as an alternative.

Locking-Related Dynamic Management Views

You can query the sys.dm_tran_locks dynamic management view to obtain information about the current state of locking. This is a preferred source of information in SQL Server 2005, although, as with all other dynamic views, it is supported only within the current version of SQL Server.

You can use the sys.dm_exec_sessions view to retrieve information about processes currently running on the server, and you can use sys.dm_exec_requests to get further information about procedures blocking an execution request.

SQL Server Profiler Lock Events

You can use SQL Server Profiler to trap several lock events:

Deadlock Graph Event Class— Provides an XML description of a deadlock.
Lock:Acquired Event Class— Indicates whether a lock has been acquired on a resource.
Lock:Cancel Event Class— Tracks requests for locks that were canceled before the locks were acquired.
Lock:Deadlock Chain Event Class— Is used to monitor when deadlock conditions occur and which objects are involved.
Lock:Deadlock Event Class— Tracks when a transaction has requested a lock on a resource already locked by another transaction, resulting in a deadlock.
Lock:Escalation Event Class— Indicates that a finer-grained lock has been converted to a coarser-grained lock.
Lock:Released Event Class— Tracks when a lock is released.
Lock:Timeout (timeout > 0) Event Class— Tracks when lock requests cannot be completed because another transaction has a blocking lock on the requested resource.
Lock:Timeout Event Class— Tracks when lock requests cannot be completed because another transaction has a blocking lock on the requested resource.

It is a good idea to trap most of these events if you are using SQL Server Profiler as a means of diagnosing locking and deadlocks in applications.

System Monitor Lock Counters

The SQLServer:Locks object provides information on locking. Minimizing locks increases concurrency, which can improve performance. You can monitor multiple instances of the Locks object at the same time. The following are the available SQL Server Locks counters:

Average Wait Time (ms)— Monitors the average time (in milliseconds) for each lock request.
Lock Requests/sec— Monitors the number of new locks and lock conversions per second.
Lock Timeouts (timeout > 0)/sec —Monitors the number of lock requests per second that timed out, excluding NOWAIT locks.
Lock Timeouts/sec— Monitors the number of lock requests per second that timed out, including NOWAIT locks.
Lock Wait Time (ms)— Monitors the total wait time (in milliseconds) for locks.
Lock Waits/sec— Monitors the number of lock requests per second that required the caller to wait.
Number of Deadlocks/sec— Monitors the number of lock requests per second that resulted in deadlock.

Obviously, you want the number of deadlocks per second to be non-existent, but this is the most common counter utilized when monitoring locking behavior.